Our client is seeking an experienced Site Reliability Engineer (SRE) to support a growing SaaS platform built on Microsoft Azure, with a strong emphasis on Azure PaaS services. The role focuses on core SRE responsibilities: reliability, availability, automation, monitoring, incident response, and continuous improvement across a modern cloud environment.
Ensure the reliability, scalability, and performance of a cloud-based SaaS platform.
Implement and maintain Azure PaaS solutions, IaC, automation, and monitoring frameworks.
Develop PowerShell automation scripts and manage Azure-based infrastructure.
Configure and operate monitoring/alerting using tools such as Azure Monitor, Grafana, Prometheus, or Datadog.
Lead incident response, root-cause analysis, and post-incident improvement actions.
Support capacity planning, optimisation, and auto-scaling strategies.
Contribute to security, compliance, and operational best practices.
Proven SRE experience in a cloud-hosted SaaS environment.
Strong, hands-on expertise with Microsoft Azure, including compute, storage, networking, monitoring, and Azure PaaS services.
Proficiency with automation and IaC (ARM, Bicep, Terraform).
Experience with containers (Docker, Kubernetes) and CI/CD pipelines.
Strong incident management, troubleshooting, and problem-solving capability.
Excellent communication and collaboration skills.
Azure certifications (Administrator, Architect).
Experience with microservices, Agile environments, and Azure SQL / Cosmos DB.
Familiarity with regulated industries or enterprise-grade compliance frameworks.
Competitive salary and benefits package.
Hybrid working and career development in a growing technology environment.
Reperio Human Capital acts as an Employment Agency and an Employment Business.